3 Methods

3.1 Protein Isolation and Expression

3.1.1 Myosin V expression

Myosin V S1, comprising the first 792 amino acid residues and including one IQ domain, was expressed in the baculovirus system as detailed previously by the Yengo Lab (Gunther et al. 2020). The expressed myosin V also contains an N-terminal tetracysteine motif and C-terminal Myc and FLAG tags (Trivedi et al. 2013, 2015, 2020; Gunther et al. 2019). The S217A mutation (serine to alanine) was introduced using QuikChange site-directed mutagenesis (Stratagene), and the construct was co-expressed with calmodulin and purified by FLAG affinity chromatography.

3.1.2 Skeletal muscle myosin II isolation

Fast skeletal muscle myosin II was isolated from chicken pectoralis muscle (Diemand Farm, Wendell, MA). All isolation steps were carried out in a cold room or on ice. Isolation was performed by Mike as previously described by the Debold lab (Woodward et al. 2020; Unger and Debold 2019; Longyear, Walcott, and Debold 2017), following a protocol similar to that of Margossian and Lowey (1982) with minor modifications. In short, chicken breast muscle is passed through a meat grinder and rinsed with 0.2M EDTA. Buffer A (extraction buffer: 0.3M KCl, 0.15M KPi, 20mM EDTA, 5mM MgCl2, 3.3mM ATP, and 5mM DTT at pH 6.7) is added at 2mL per gram of tissue along with 5mL of protease inhibitors, and the mixture is stirred for 12 minutes with an overhead stirrer. The extraction is stopped with a 4X dilution into water, which is then mixed and filtered. After the precipitate settles, it is centrifuged at 10,800g for 10 minutes at 4°C, and the resulting pellet is resuspended in Buffer B (suspension buffer: 1M KCl, 60mM KPi, 20mM EDTA, and 5mM DTT at pH 6.7), mixed gently, and dialyzed overnight. Actomyosin is then precipitated and centrifuged at 41,171g for 1 hour at 4°C, and the resulting supernatant is diluted 10X with water. Clear supernatant is siphoned off and the remainder is centrifuged again at 10,800g for 15 minutes at 4°C. The supernatant is poured off and the precipitate is resuspended in Buffer D (resuspension buffer: 3M KCl, 50mM KPi, and 5mM DTT at pH 6.7) before being dialyzed overnight a second time against Buffer E (dialysis buffer: 0.6M KCl, 50mM KPi, 1mM NaN3, and 5mM DTT at pH 7.0). After dialysis, the myosin is clarified by ultracentrifugation for 2 hours at 4°C, its concentration determined, and the protein snap frozen in liquid nitrogen and stored at -80°C.

3.1.3 Actin isolation and labeling

Acetone powder was prepared from the remainder of the myosin isolation (the leftovers from the filtered cheesecloth), and actin was purified from the resulting acetone powder as described by Pardee and Spudich (1982) with modifications. Briefly, acetone powder was finely ground, mixed with extraction buffer (2mM Tris Base, 0.2mM CaCl2, and 0.005% NaN3 at pH 8.0), and stirred with an effort to minimize the creation of bubbles. The resulting solution is spun at 28,960g for 20 minutes and the supernatant filtered off and set aside. Additional extraction buffer is added to the gel-like precipitate, which is centrifuged a second time under the same conditions, with the supernatant filtered off and combined with the first. Actin is polymerized from the pooled supernatant by addition of a final polymerization solution (50mM KCl, 2mM MgCl2, and 1mM ATP). Salt is added slowly to prevent “salt shocking” the proteins, and the solution is left to stir overnight. The next day, a high-salt wash (KCl increased to 600mM) removes tropomyosin from the F-actin, and the F-actin is then sedimented by centrifuging at 205,835g for 60 minutes. The precipitate is transferred to a homogenizer and resuspended in extraction buffer. A 4-day dialysis against extraction buffer additionally containing ATP and DTT depolymerizes the actin. After dialysis, the resulting G-actin is clarified by ultracentrifugation at 200,000g for 60 minutes. Afterwards, actin is polymerized by adding 10mM Imidazole (pH 7.0) and 1mM MgCl2 and dialyzed against final storage buffer (4mM Imidazole, 25mM KCl, 2mM MgCl2, 1mM NaN3, and 0.01mM ATP at pH 7.0). After the final concentration is calculated, actin is snap frozen in liquid nitrogen and stored at -80°C. Once isolated, actin can be labeled with 100% TRITC for use in the in vitro motility assay or with a 50:50 TRITC/biotin mix for use in the three-bead laser trap assay.

3.2 Laser trap assay

The laser trap assay was performed as previously described by the Debold Lab (Woodward et al. 2020; Unger and Debold 2019; Longyear, Walcott, and Debold 2017) with special considerations for the expressed myosin V. Single molecules of myosin were adhered to a nitrocellulose-coated microscope slide containing 3µm glass pedestal beads, with a coverslip glued on top to construct a “flow cell.” Myosin (final concentration ~0.8-1µg/mL) was added after introduction of an anti-Myc antibody (0.8µg/mL, Sigma Inc.), which provided a binding interface for the expressed myosin on the surface. Bovine serum albumin (BSA) was used to block the remainder of the surface before addition of the final buffer. The final buffer consisted of an actin buffer (91mM KCl, 1mM EGTA, 4mM MgCl2, and 1mM DTT at pH 7.0) mixed with 100µM ATP and an oxygen scavenger system (29mM glucose, 1.5mM glucose oxidase, and 80 units catalase) at pH 7.0. For 30mM Pi experiments, KCl was reduced to maintain the 125mM total ionic strength of the control 0mM Pi experiments. The concentrations of TRITC/biotin-labeled actin filaments and neutravidin/streptavidin-coated 1µm beads (Bangs Lab Inc) were varied at the trapper’s discretion. Bead-actin-bead “dumbbells” were constructed using a three-axis piezo-controlled stage (Mad City Labs) with a laser trap time-shared between two positions. Experiments were performed at 1.5W laser power with actin filaments pretensioned to 3-4pN. The resulting system stiffness of the two laser traps and the pretensioned filament was 0.04pN/nm, determined via the equipartition method (Dupuis et al. 1997). Bead position was tracked using a four-quadrant photodiode at a sampling rate of 5kHz.
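As a minimal sketch of the equipartition calculation (not the lab's exact code): the trap stiffness follows from the variance of the thermally fluctuating bead position, \(k = k_BT/\langle x^2 \rangle\). Here `bead_pos_nm` is an assumed vector of bead positions (in nm) recorded for an unattached, trapped bead.

```r
# Equipartition estimate of trap stiffness, k = kB*T / var(x)
kBT_pN_nm <- 4.07                          # kB*T at ~295 K, in pN*nm
k_trap    <- kBT_pN_nm / var(bead_pos_nm)  # trap stiffness, pN/nm
k_trap
```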

3.3 Ultra-fast force clamp

The UFFC data were collected under the same experimental conditions and buffers described in section 3.2 for the standard laser trap assay, but full-length fast skeletal muscle myosin IIx was adhered to the nitrocellulose-coated coverslips instead of the myosin V and antibody combination. Since UFFC is an extension of the standard three-bead assay, the implementation was similar with regard to setting up a dumbbell. However, to increase the speed at which the dumbbell can move through solution when the force clamp is engaged, smaller 510nm beads (Bangs Lab) were coated with neutravidin (31000, Pierce) for use in the UFFC. Decreasing the radius of the beads reduces the Stokes drag coefficient, since \(\beta=6 \pi \eta r\), where \(\beta\) represents the drag coefficient, \(\eta\) the viscosity of the solution, and \(r\) the radius of the bead. The velocity of the dumbbell is in turn inversely proportional to the drag coefficient, as given by \(velocity = F_{total}/\beta\): the smaller the drag, the faster the dumbbell can move. Consequently, the increased velocity of the dumbbell in the unattached state increases the time resolution and signal-to-noise ratio of the resulting data. Force was pre-determined for each experimental condition and set by calculating the bead’s displacement from the center of the trap and converting it to a force via \(F = -kx\), where \(x\) is the bead’s displacement and \(k\) is the stiffness of the trap. The trap stiffness was similar to the standard laser trap experiments at roughly 0.04pN/nm, calculated by analysis of the power spectrum. Positions of the traps were controlled through the AODs and a custom LabVIEW program. Bead position was collected at a 200kHz sampling frequency. The total feedback delay was around 8µs; this is the total time it takes for the computers to detect changes in the bead’s position relative to the center of the trap, communicate the information, and for the AODs to respond to the changes.
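To put numbers on the drag/velocity trade-off, here is a small illustration with hypothetical values (the clamp force of 4pN is arbitrary; `eta` is the viscosity of water converted to pN·s/nm²):

```r
# Stokes drag beta = 6*pi*eta*r and dumbbell velocity v = F/beta
eta      <- 1e-9                               # 1e-3 Pa*s in pN*s/nm^2
force_pN <- 4                                  # illustrative clamp force, pN
beta     <- function(r_nm) 6 * pi * eta * r_nm # drag coefficient, pN*s/nm
c(v_510nm = force_pN / beta(255),              # 510 nm bead: ~8e5 nm/s
  v_1um   = force_pN / beta(500))              # 1 um bead:   ~4e5 nm/s
```

The 510nm bead roughly doubles the unattached velocity relative to a 1µm bead at the same force, which is the gain in time resolution described above.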

Actomyosin interactions in UFFC are identified by applying a threshold to a velocity transformation of the raw displacement data generated during data collection. The velocity is calculated instantaneously on a point-to-point basis and the result smoothed with a Gaussian filter. The velocity transformation yields a double Gaussian distribution with two peaks corresponding to the average velocities of the bound and unbound populations. The bound population’s velocity is centered around zero because when myosin binds actin and imposes its own stiffness/drag, the UFFC feedback attempts to apply a constant force to the dumbbell, causing the traps to stop moving. The threshold is set at the point along the joint PDF of the double Gaussian where the probability of crossing the threshold due to noise is the same for bound and unbound events. The threshold was then optimized for each record to keep the number of falsely detected events below 1%. If false events exceeded 1%, the SD of the Gaussian filter was increased to further smooth the data, decreasing the probability of a baseline noise artifact crossing the threshold. Note that smaller Gaussian filter SDs could usually be applied at greater forces: achieving a higher force requires a faster baseline velocity, which increases the signal-to-noise ratio. Additionally, a correction factor is applied when determining the start and end of events, because the optimal threshold sits closer to the peak of the bound population in the velocity distribution; the bound population has a narrower peak (smaller SD) since myosin stiffness is greater than trap stiffness. After event identification, events were ensemble averaged by synchronizing them in the x and y dimensions: a linear regression is applied to the baseline prior to the start of each event (while the bead is moving at constant velocity), and events are overlaid at the point where the regression intersects the previously identified event start. The resulting ensembles could be fit with a model consisting of a linear portion describing the delay before the powerstroke and a double exponential reflecting the kinetics/mechanics of myosin going through the first and second powerstrokes. Ensemble averages are typically calculated separately for the three populations of event durations prevalent in UFFC experiments (short, intermediate, and long), since these events represent different mechanochemical schemes of an actomyosin interaction.
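A rough sketch of the velocity-thresholding idea follows (assumed inputs, not the production UFFC code): `pos_nm` is the clamped-bead position at 200kHz, and `vel_threshold` would come from the double-Gaussian fit described above.

```r
# Point-to-point velocity followed by Gaussian smoothing and thresholding
hz  <- 200000
vel <- c(NA, diff(pos_nm)) * hz        # instantaneous velocity, nm/s

# Gaussian filter implemented as a normalized moving-average kernel
gauss_smooth <- function(x, sd_pts) {
  half <- 3 * sd_pts
  w    <- dnorm(-half:half, sd = sd_pts)
  as.numeric(stats::filter(x, w / sum(w), sides = 2))
}
vel_smooth <- gauss_smooth(vel, sd_pts = 50)

bound <- abs(vel_smooth) < vel_threshold   # TRUE while myosin is bound
```

Increasing `sd_pts` widens the kernel, which is the smoothing adjustment used when false events exceed 1%.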

3.4 Step-by-step single molecule event identification

This is a “step-by-step” walkthrough of the Hidden-Markov/Changepoint analysis we use to analyze our single molecule laser trap data, covering everything on the journey from raw data to analyzed trace…buckle up. This section is a lot cooler if you are reading online, where it includes the R code to reproduce this by hand. You can download the data here.

3.4.1 Raw data

Here is a raw data trace. This is unprocessed data, as-is from the trap computer. The data is the relative position of the bead (in mV) over time:

Figure 3.1: Raw trap data.

The data record is 89.1576 seconds long and has an average position of 6.2968527 mV.

3.4.2 Processed Data

The first step of the analysis is removing the “baseline” by centering the mean around 0. This can be done either by simply subtracting the baseline mean from every data point or by performing a piecewise linear detrend on the whole record. The latter accomplishes two things: 1) it centers the mean around 0, and 2) it removes any drift (i.e., wander correction). Additionally, in the lasertrapr app you can find the average baseline position either by using a mean-variance transformation of the data to select the baseline population or by selecting a quiescent period of the data where no binding events occur. Here we will detrend the data and convert from mV to nm using a “step calibration.” The step calibration is performed by moving the microscope stage a known distance, say 200 nanometers, and measuring the resulting change in the mV signal. We can then estimate the number of nanometers per mV.

Figure 3.2: Example of a step calibration. Stage was moved 200nm

The mV-to-nm conversion calibrated around the time this data trace was collected was 30 nm/mV. We can convert our raw data from mV to nm and detrend the data (or vice versa).
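One way this could be done is sketched below with the pracma package (lasertrapr's internals may differ); `raw_mv` is the raw trace in mV, and the breakpoint spacing of 25,000 points (5 s at 5kHz) is an assumption for the piecewise fit.

```r
# Convert mV to nm, then piecewise-linear detrend to remove baseline drift
library(pracma)
nm_per_mv <- 30                                          # from the step calibration
pos_nm    <- raw_mv * nm_per_mv
bp        <- seq(25000, length(pos_nm) - 1, by = 25000)  # piecewise breakpoints
processed <- as.numeric(detrend(pos_nm, tt = "linear", bp = bp))
```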

Figure 3.3: Example of processed trap data

3.4.3 Running Mean & Variance

The next step is to transform the data for the HM-Model by calculating both the running mean and variance of the data trace. This analysis uses a 150 datapoint window that advances by sliding 75 data points each time (half the window width). This is done to decrease the correlation between neighboring windows:

Figure 3.4: Interactive graphs of the running mean and variance
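As a sketch, the half-overlapping windows can be computed with zoo's `rollapply` (assuming `processed` is the detrended trace in nm; the data frame name `windows` is ours):

```r
# Running mean and variance: 150-point windows advancing 75 points at a time
library(zoo)
run_mean <- rollapply(processed, width = 150, by = 75, FUN = mean)
run_var  <- rollapply(processed, width = 150, by = 75, FUN = var)
windows  <- data.frame(run_mean = run_mean, run_var = run_var)
```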

Additionally, we can plot these datasets against each other to see the mean vs. variance for each window:

Mean/Variance Plot

Figure 3.5: Mean/Variance Plot

3.4.4 Hidden-Markov Model

The data is now ready to have events identified with a Hidden-Markov Model. We first need to initialize the model with guesses of the initial state probabilities and the transition probabilities for our 2-state model. State 1 is when myosin is unattached from actin, and state 2 is when it is attached. We need to guess 6 different numbers:

  1. Probability of initially starting in state 1
  2. Probability of initially starting in state 2
  3. Transition probability of going from state 1 to state 1
  4. Transition probability of going from state 1 to state 2
  5. Transition probability of going from state 2 to state 1
  6. Transition probability of going from state 2 to state 2

I prep the data so it (nearly) always starts with baseline (i.e., state 1), or trim the trace so it does; since this is the case 99.9% of the time, we will guess a 98% probability of starting in state 1. The probability of starting in state 2 is then 1 - [Prob. S1] = 0.02. I then assume that these are both stable states and that there is a high probability of transitioning from state 1 to state 1 or from state 2 to state 2. By the same logic the transition probabilities are guessed, and our 6 probabilities above are:

## [1] 0.98 0.02 0.98 0.02 0.02 0.98

We will also have to make guesses at the statistical characteristics of the 4 underlying Gaussian distributions (2 states for each of the running mean and running variance signals). To do this we will estimate the mean and standard deviation of each Gaussian. The 8 numbers that follow are:

  1. Guess for the mean of the variance for state 1: calculated by taking the mean of the running variance.
  2. Guess for the SD of the variance for state 1: calculated by taking the SD of the running variance.
  3. Guess for the mean of the mean for state 1: hard-coded to 0, because we centered the baseline around 0 when we processed the data.
  4. Guess for the SD of the mean for state 1: calculated by taking the SD of the running mean.
  5. Guess for the mean of the variance for state 2: estimated as half the state 1 value (2:1 signal-to-noise).
  6. Guess for the SD of the variance for state 2: #2 divided by 2, because the system is stiffer when myosin is attached.
  7. Guess for the mean of the mean for state 2: hard-coded at 5nm (the estimated size of the powerstroke).
  8. Guess for the SD of the mean for state 2: twice #4, since there will be both positive and negative displacements.

## [1] 101.69  45.56   0.00   6.47  50.85  22.78   5.00   6.47
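A minimal depmixS4 sketch of this 2-state, two-response model with the starting guesses above follows (the `windows` data frame and its column names are assumptions carried over from earlier):

```r
# Two-state HMM over the running variance and running mean signals
library(depmixS4)
mod <- depmix(list(run_var ~ 1, run_mean ~ 1),
              data    = windows,
              nstates = 2,
              family  = list(gaussian(), gaussian()))
# Parameter order: initial probs, transition probs, then the 8 response parameters
mod <- setpars(mod, c(0.98, 0.02, 0.98, 0.02, 0.02, 0.98,
                      101.69, 45.56, 0.00, 6.47, 50.85, 22.78, 5.00, 6.47))
hmm <- fit(mod, emcontrol = em.control(random.start = FALSE))
summary(hmm)
```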

Once we estimate the starting point the model can be fit. The HM-Model will optimize all the parameters we just defined using the Expectation-Maximization (EM) algorithm. The resulting model summary (Re1 = “response 1,” the variance signal; Re2 = “response 2,” the mean signal):

## converged at iteration 12 with logLik: -47137.21
## Initial state probabilities model 
## pr1 pr2 
##   1   0 
## 
## Transition matrix 
##         toS1  toS2
## fromS1 0.974 0.026
## fromS2 0.180 0.820
## 
## Response parameters 
## Resp 1 : gaussian 
## Resp 2 : gaussian 
##     Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
## St1         113.302 36.224          -0.758  3.950
## St2          22.375  8.612           5.185 13.816

Starting from the first line of the summary, the model gives a 100% probability that this record starts in baseline (state 1). The transition matrix is straightforward to read, and these probabilities could also be used as estimates of on/off rates, although we have not been using them for that. Lastly, the “response parameters” are the optimized characteristics of the Gaussian distributions describing states 1 and 2 for both the mean and variance signals. The variance distribution has a mean of 113 (SD of 36) for state 1 and a mean of 22 (SD of 8) for state 2, giving a signal-to-noise ratio of a little more than 5:1 in the variance signal. The baseline mean signal has a mean of -0.7 (SD of 4), and the state 2 mean is 5.1 (SD of 13).

Now, knowing these parameters, we can extract the most probable state sequence through the trace via the Viterbi algorithm. Here is a table of the posterior states (the state column is the Viterbi state; S1 & S2 are the delta probabilities of being in each state):

state S1 S2
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
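In depmixS4 terms, this table might be produced with `posterior()` using Viterbi decoding (a sketch; `hmm` is the fitted model from the earlier sketch):

```r
# Most probable state sequence and per-window delta probabilities
post   <- posterior(hmm, type = "viterbi")
states <- post$state          # 1 = baseline, 2 = attached event
head(post)
```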

We can now construct another Mean/Variance plot using the running window transformation and this time color code the windows by State:

Figure 3.6: Mean/Variance Plot colored by State

At this point I think we have completed the hardest part of laser trap analysis - picking events. Now we need to find the peak displacements (and subsequently forces), time-offs, and time-ons for each event. I call this ‘measuring events’ (because in reality I feel like these could be simple measurements you could make with a ruler).

3.4.5 Measure Events

The HM-Model state sequence decoding assigns every running window a 1 or a 2 describing the state the window most likely belongs to. So all we are given is a long list of 1-2 indicators the same length as our running mean/variance.

We need to count how many 1’s and 2’s occur in a row and then take the cumulative sum of these ‘in-a-row’ counts to get the indices where events start/stop in running-window time. Here are the numbers of 1’s and 2’s that occur in a row.

lengths values
8 1
12 2
16 1
10 2
5 1
12 2

The table reads as follows: the raw trace starts with 8 windows of state 1 baseline noise, followed by 12 windows of a state 2 event, and so on.
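These ‘in-a-row’ counts are exactly what base R’s `rle()` computes (using `states` from the Viterbi decoding above):

```r
# Run-length encoding of the state sequence; cumsum gives run boundaries
runs <- rle(states)
head(data.frame(lengths = runs$lengths, values = runs$values))
event_bounds <- cumsum(runs$lengths)   # window indices where each run ends
```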

On-times in milliseconds can be estimated by taking those state 2 ‘in-a-row’ lengths and converting them first to 5kHz time and then to ms. The conversion from window time to 5kHz time is calculated by dividing the length of the raw/processed data by the length of the resulting running mean/var calculations:

## The conversion between raw data and running windows is  75

This also works out to be the sliding distance of the advancing window in the running mean/var calculations. Estimating on-times is then straightforward: multiply the state 2 ‘in-a-row’ length by the conversion, divide by the sampling frequency (5000) to get seconds, and multiply by 1000 for ms.
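As a one-line sketch of that conversion (using `runs` from above; the conversion of 75 points per window and the 5kHz sampling rate are from the text):

```r
# On-times: state 2 run lengths -> raw data points -> milliseconds
conversion <- 75
hz         <- 5000
time_on_ms <- runs$lengths[runs$values == 2] * conversion / hz * 1000
```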

The same idea can be applied to the state 1 baseline to get the off-times. In the process, the first and last off-times are excluded: we do not know when the last event before our first recorded event actually occurred, and the final stretch of baseline ends only because we stop recording, so it is not a true measure either. Here is the table of on/off times:

n_event  hmm_state  num_windows  length_5kHz  time_on_ms  time_off_ms
      1          2           12    900.27869   180.05574           NA
      2          2           10    750.23225   150.04645    240.07432
      3          2           12    900.27869   180.05574     75.02322
      4          2           11    825.25547   165.05109    180.05574
      5          2            1     75.02322    15.00464    630.19509
      6          2           27   2025.62706   405.12541    120.03716

In this table the time_off_ms column refers to the off time that occurred prior to the event.

To pinpoint where the events start and stop, we can take the cumulative sum of these ‘in-a-row’ lengths, which gives the running-window indices of the event starts/stops. This will help us chunk out the events to measure step sizes. After calculating the cumulative sum, we can make a new table with two columns: state_1_end, the last window in a series of baseline windows, and state_2_end, the last window of an event series.

state_1_end state_2_end
8 20
36 46
51 63
75 86
128 129
137 164

The result is the running window indices of the window before the event starts and the window that ends the event. By adding one (+1) to the state_1_end value we get the index of the start of each event.

## So the first event is between indices 9 and 20

We can estimate the step size from the running-windowed data. To get the step size we find the running window with the greatest absolute value and take its actual (signed) value. Finding the index of the window with the greatest absolute value also lets us find the peaks of negative events more accurately than simply taking the max right off the bat.

This alone would give us step size estimates, since we already processed the data to be centered around 0. However, the baseline signal does not always return to 0 after an event. For a more precise estimate of the step size, we calculate the average of the baseline noise prior to each event and subtract that baseline average from the step size estimate.
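A sketch of measuring one event’s step size from the running mean (windows 9-20 are the first event found above, and windows 1-8 its preceding baseline):

```r
# Signed peak displacement, corrected for the preceding baseline level
event_win <- run_mean[9:20]
peak      <- event_win[which.max(abs(event_win))]  # signed peak displacement
avg_s1    <- mean(run_mean[1:8])                   # baseline before the event
step_nm   <- peak - avg_s1                         # final reported step size
```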

This results in a table giving the mean of the baseline before each event (avg_s1), the estimated step size (avg_s2), and the difference between the two (diff), which is the final step size the program reports for each event.

avg_s1 avg_s2 diff
-9.2145628 -12.734709 -3.520146
-6.9028608 -3.127884 3.774977
-0.6201406 18.654526 19.274667
-4.1346721 14.146184 18.280856
-4.5201374 17.046019 21.566157
-0.9125085 21.452987 22.365496

3.4.6 Direction Correction

Admittedly, sometimes the actin filament is oriented in the wrong direction. To compensate for backwards filaments the program has a so-called ‘direction correction’ and will auto-magically flip the raw trace if the filament was oriented the wrong way. After analyzing step sizes, if the program identified more negative events than positive events, it assumes a backward filament orientation in the trap and flips the trace over the x-axis by multiplying every value by negative one (-1). If the records were flipped, so were the calculated step sizes; these new values are reported along with the force measures, calculated as the step size multiplied by the user-inputted nm-to-pN conversion. These values are added to the on/off time table calculated previously.
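An illustrative sketch of that logic (the vector `step_sizes` and the user-supplied conversion `nm_to_pn` are assumed names):

```r
# Flip the trace and steps if negative events outnumber positive ones
if (sum(step_sizes < 0) > sum(step_sizes > 0)) {
  processed  <- processed * -1     # flip the trace over the x-axis
  step_sizes <- step_sizes * -1    # flipped steps are what get reported
}
forces_pN <- step_sizes * nm_to_pn # user-inputted nm -> pN conversion
```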

3.4.7 Changepoint Analysis

Since we have identified the events, we can also perform ensemble averaging with a little more work. Unfortunately, preparing the data for the HM-Model by transforming it into running means/variances decreased our time resolution, as we lost a lot of information. Our original record contains 445788 data points, whereas the running mean/var contains only 5942 windows.

The resulting start/end of each event is therefore really just an estimate. To do better, the program uses changepoint analysis on the transition periods into and out of each event to re-estimate the start of each event at the original sampling frequency.

To obtain the transition periods from the raw data in 5kHz time, the running-window indices of the state_1_ends and state_2_ends are converted to 5kHz time using the conversion ratio of ~75 we previously calculated as the ratio of the number of data points in the raw data to the number of windows in the running mean/var calculations.

More specifically, the program takes the running-window index ~1.5 windows back into the baseline from the state_1_end window through the index of the first state 2 window, and converts these back into 5kHz time to supply a slightly larger transition window in which to search for the ‘true’ event start. Doing this for every event, we obtain a new table of 5kHz time indices that should contain the transition into every event.

The changepoint analysis is actually performed on a new running variance of the processed data with a variable window width. Here I use 5 data points (1ms) because the trace has exceptional signal-to-noise. This running transformation advances one point at a time. The resulting running variance transform looks like this:

Figure 3.7: 1ms running variance transformation

We can then chunk out the transition periods with our indices and apply changepoint analysis to each one, looking for a change in the mean of the variance signal for every event, and plot the results. The analysis looks for only a single changepoint. Note, this is an early version of the changepoint analysis that lasertrapr used; see the source code of the app on GitHub for the up-to-date version.
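A single mean-changepoint of this kind can be located with the changepoint package (a sketch of the idea, not lasertrapr’s current code; `run_var_1ms`, `win_start`, and `win_end` are assumed names for the 1ms running variance and one transition window’s bounds):

```r
# Find the single change in mean of the variance signal within the window
library(changepoint)
fit_cpt <- cpt.mean(run_var_1ms[win_start:win_end], method = "AMOC")
event_start_5kHz <- win_start + cpts(fit_cpt) - 1   # absolute 5 kHz index
```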

Figure 3.8: Changepoint identified start of event in running variance

We can also plot the corresponding point in the raw/processed data that is ultimately used in the ensemble average:

Figure 3.9: Changepoint identified start of event in the original data set

This same approach can be applied to the backside (end) of each event. Once backside changepoints are identified, more precise measurements of time on, time off, displacements, and forces can be estimated. For now, we can go ahead and make one final plot showing the complete analysis overlaid on the raw/processed data. The pink shading indicates the start/end of each event. The vertical dashed lines are placed at the peak displacement of each event and labeled with the step size and event duration. No analysis is perfect; some shorter events are missed. These are usually running windows with high variance due to the window overlapping both baseline and an event.

Figure 3.10: Results of analysis overlayed on data trace